Theory

My first two papers seem to indicate that people who spend time in areas with more continuous urban fabric are more likely to exhibit prosocial behaviors (valuing immigrants, rejecting the far-right). Using Mosquito Alert data, this analysis examines whether urban fabric is also associated with increased likelihood to participate in a citizen science project that benefits the community.

Data

I use all Mosquito Alert user reports, dating back to 2014, as an indicator of participation in the project. The coordinates of the reports are joined with Copernicus Urban Atlas data on the continuity of underlying urban fabric, census-tract-level data on the median income and population density of the underlying tract, and Mosquito Alert’s predicted probability of encountering a Tiger mosquito on the date of the report and in it’s underlying sampling cell. I limit the dataset to reports within the Barcelona metropolitan area, as defined by the Copernicus Urban Atlas.

# Read income and population density data
tracts <- st_read("CLEAN_tracts_lc.GEOjson", quiet = TRUE) %>% filter(NPRO == "Barcelona") %>% mutate(density = population/Shape_area) %>% select(NMUN,CUSEC,density,renta_persona)

# Read land cover data
lc <- st_read("ES002L2_BARCELONA/Shapefiles/ES002L2_BARCELONA_UA2012.shp", quiet = TRUE) %>% select(CODE2012, ITEM2012) %>% st_transform("+proj=longlat +ellps=WGS84 +datum=WGS84") %>% st_make_valid() %>% mutate(uf_continuity = ifelse(CODE2012 == "11100",.90, ifelse(CODE2012 == "11210",.65, ifelse(CODE2012 == "11220",.40, ifelse(CODE2012 == "11230",.20, ifelse(CODE2012 == "11240",.05,NA))))))

# Read mosquito encounter probability data
for (year in 2014:2021){
  temp <- readRDS(paste0("ma_predictions/MA_sampling_cell_predictions_daily_",year,".Rds"))
  if (year == 2014) {ma.preds <- temp} else {ma.preds <- bind_rows(ma.preds,temp)}
}

ma.preds <- ma.preds %>% select(sea_days,year,TigacellID,prob_median) %>% rename(mosq_prob = prob_median)

# Read Mosquito Alert report data and join all of the above
reports <- readRDS("mosquito_alert_cleaned_reports.Rds") %>% filter(!is.na(lon)) %>% st_as_sf(coords = c("lon", "lat"), crs = "+proj=longlat +ellps=WGS84 +datum=WGS84") %>% mutate(year = format(date, "%Y"), sea_days = as.numeric(date - as.Date(paste0(year,"-01-01")))) %>% select(date,user,unique_report_id,TigacellID,sea_days,year) %>% st_join(tracts) %>% na.omit() %>% st_join(lc) %>% left_join(ma.preds, by = c("sea_days","year","TigacellID")) %>% na.omit()

I treat the five categories of continuous/discontinuous urban fabric as continuous by assigning the median value of the proportions of soil sealing on which the categories are based. Reports outside of urban fabric are omitted. See table below.

lc.codes <- lc
st_geometry(lc.codes) <- NULL
lc.codes %>% select(ITEM2012,uf_continuity) %>% unique() %>% na.omit() %>% arrange(desc(uf_continuity)) 
##                                                       ITEM2012 uf_continuity
## 1                       Continuous urban fabric (S.L. : > 80%)          0.90
## 2         Discontinuous dense urban fabric (S.L. : 50% -  80%)          0.65
## 3 Discontinuous medium density urban fabric (S.L. : 30% - 50%)          0.40
## 4    Discontinuous low density urban fabric (S.L. : 10% - 30%)          0.20
## 5   Discontinuous very low density urban fabric (S.L. : < 10%)          0.05

Random Non-reports

To estimate the likelihood of making a Mosquito Alert report, given the underlying features of a location, I build a set of random non-reports. To do so, I generate random longitudes, latitudes, and dates, to which I join the same data as above and include only those that fall within urban fabric. See descriptives below for comparison between real reports and non-reports.

# Generate 200,000 random points within bounding box to ensure a sufficient number of points land within urban fabric
random.size <- 200000

cell.tracts <- reports %>% select(TigacellID,CUSEC)
st_geometry(cell.tracts) <- NULL
cell.tracts <- cell.tracts %>% unique() %>% group_by(CUSEC) %>% slice(1)

nonreports <- tibble(lon = runif(random.size,min = min(st_coordinates(reports)[,1]),max = max(st_coordinates(reports)[,1])), lat = runif(random.size,min = min(st_coordinates(reports)[,2]),max = max(st_coordinates(reports)[,2])), report = 0, num.reports = 0, sea_days = round(runif(random.size, 0, 365)), year = as.character(round(runif(random.size, 2014, 2021)))) %>% st_as_sf(coords = c("lon", "lat"), crs = "+proj=longlat +ellps=WGS84 +datum=WGS84") %>% st_join(tracts) %>% st_join(lc) %>% left_join(cell.tracts, by = "CUSEC") %>% left_join(ma.preds, by = c("sea_days","year","TigacellID")) %>% na.omit() %>% mutate(user = ifelse(report == 0, paste0("nonuser_",row_number()),user))

# Remove geometry after joining for faster analysis
nonreports.geo <- nonreports
st_geometry(nonreports) <- NULL
d <- reports
st_geometry(d) <- NULL

# Map
tm_shape(bind_rows(reports %>% mutate(type = "real report"), nonreports.geo %>% mutate(type = "random non-report"))) +
  tm_dots("type")
# Descriptives
d %>% mutate(type = "real report") %>% bind_rows(nonreports %>% mutate(type = "random non-report")) %>% group_by(type) %>% summarise(count = n(), uf_continuity = mean(uf_continuity), renta_persona = mean(renta_persona), density = mean(density), mosq_prob = mean(mosq_prob))
## # A tibble: 2 x 6
##   type              count uf_continuity renta_persona density mosq_prob
##   <chr>             <int>         <dbl>         <dbl>   <dbl>     <dbl>
## 1 random non-report  9493         0.475        16105. 0.00573    0.0929
## 2 real report        6096         0.682        16051. 0.0205     0.306

Models

I run two binomial models with the “report” variable as the DV and continuous urban fabric, median income, population density, and mosquito encounter probability as the IVs, along with municipality as a upper level.

In the first model, I compare the non-reports to each user’s first report–the moment they decided to participate. In the second model, I take the user’s mean values for each IV (and their most common municipality). Both models show a strong and significant positive relationship between urban fabric continuity and Mosquito Alert participation, as well as the expected positive relationships with population density (more potential users in these areas) and mosquito encounter probability.

# Model 1: Participation, user's first report
d1 <- d %>% arrange(date) %>% group_by(user) %>% slice(1) %>% mutate(report = 1) %>% bind_rows(nonreports)

summary(m1 <- glmer(report ~ scale(uf_continuity) + scale(renta_persona) + scale(density) + mosq_prob + (1|NMUN), d1, family = "binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## report ~ scale(uf_continuity) + scale(renta_persona) + scale(density) +  
##     mosq_prob + (1 | NMUN)
##    Data: d1
## 
##      AIC      BIC   logLik deviance df.resid 
##  10088.2  10132.6  -5038.1  10076.2    12085 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3490 -0.4271 -0.3095 -0.2360  4.6423 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  NMUN   (Intercept) 0.1646   0.4057  
## Number of obs: 12091, groups:  NMUN, 122
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -2.11237    0.05507 -38.360  < 2e-16 ***
## scale(uf_continuity)  0.36537    0.03446  10.602  < 2e-16 ***
## scale(renta_persona) -0.03779    0.02950  -1.281 0.200127    
## scale(density)        0.12873    0.03351   3.842 0.000122 ***
## mosq_prob             2.91887    0.11180  26.109  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) scl(f_) scl(r_) scl(d)
## scl(f_cntn) -0.065                       
## scl(rnt_pr)  0.142  0.139                
## scal(dnsty)  0.188 -0.341   0.253        
## mosq_prob   -0.290 -0.019  -0.048  -0.026
# Model 2: Participation, user means
user.muni <- d %>% group_by(user,NMUN) %>% count() %>% arrange(desc(n)) %>% slice(1) %>% select(user,NMUN)
d2 <- d %>% group_by(user) %>% summarise(uf_continuity = mean(uf_continuity), renta_persona = mean(renta_persona), density = mean(density), mosq_prob = mean(mosq_prob)) %>% mutate(report = 1) %>% left_join(user.muni, by = "user") %>% bind_rows(nonreports)

summary(m2 <- glmer(report ~ scale(uf_continuity) + scale(renta_persona) + scale(density) + mosq_prob + (1|NMUN), d2, family = "binomial"))
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## report ~ scale(uf_continuity) + scale(renta_persona) + scale(density) +  
##     mosq_prob + (1 | NMUN)
##    Data: d2
## 
##      AIC      BIC   logLik deviance df.resid 
##  10342.2  10386.7  -5165.1  10330.2    12218 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6582 -0.4355 -0.3133 -0.2368  4.6393 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  NMUN   (Intercept) 0.1525   0.3905  
## Number of obs: 12224, groups:  NMUN, 122
## 
## Fixed effects:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -2.08009    0.05370 -38.733  < 2e-16 ***
## scale(uf_continuity)  0.37865    0.03409  11.107  < 2e-16 ***
## scale(renta_persona) -0.02319    0.02916  -0.795    0.426    
## scale(density)        0.13424    0.03358   3.997 6.41e-05 ***
## mosq_prob             3.11170    0.11195  27.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) scl(f_) scl(r_) scl(d)
## scl(f_cntn) -0.074                       
## scl(rnt_pr)  0.141  0.135                
## scal(dnsty)  0.193 -0.352   0.252        
## mosq_prob   -0.309 -0.015  -0.048  -0.029

Next Steps